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Elpasolite is the predominant quaternary crystal structure (AlNaK 2 F 6 prototype) reported in the 
Inorganic Crystal Structure Database. We have developed a machine learning model to calculate 
density functional theory quality formation energies of all ~2 M pristine ABC 2 D 6 elpasolite crystals 
which can be made up from main-group elements (up to bismuth). Our model’s accuracy can 
be improved systematically, reaching 0.1 eV/atom for a training set consisting of 10 k crystals. 
Important bonding trends are revealed, fluoride is best suited to ht the coordination of the D 
site which lowers the formation energy whereas the opposite is found for carbon. The bonding 
contribution of elements A and B is very small on average. Low formation energies result from A 
and B being late elements from group (II), C being a late (I) element, and D being fluoride. Out of 
2M crystals, 90 unique structures are predicted to be on the convex hull—among which NFADCae, 
with peculiar stoichiometry and a negative atomic oxidation state for Al. 


Elpasolite (AlNaK 2 F 6 ) is a glassy, transparent, lus¬ 
ter, colorless, and soft quaternary crystal in the Fm3m 
space group which can be found in the Rocky Moun¬ 
tains, Virginia, or the Apennines. The elpasolite crystal 
structure (See Fig. is not uncommon, it is the most 
abundant prototype in the Inorganic Crystal Structure 
Database m- Some elpasolites emit light when exposed 
to ionic radiation, which makes them interesting material 
candidates for scintillator devices mil]. One could use 
first-principle methods such as density functional theory 
(DFT) Oini to computationally predict the existence and 
basic properties of every elpasolite. Unfortunately, even 
when considering crystals composed of only main group 
elements (columns I to VIII) the sheer number of the 
~2 M possible combinations makes DFT based screening 
challenging—if not prohibitive. Recently, computation¬ 
ally efficient machine learning (ML) models were intro¬ 
duced for predicting molecular properties with the same 
accuracy as DFT umi. Requiring only milliseconds per 
prediction, they represent an attractive alternative when 
it comes to the combinatorial screening of millions of 
crystals. While some ML model variants have already 
been proposed for solids HSH, a generally applicable 
ML-scheme with DFT accuracy of formation energies is 
still amiss. 

In this Letter we introduce a newly developed ML 
model which we use to investigate the formation ener¬ 
gies of all ~2M elpasolites made from all main-group 
elements up to Bi. Resulting estimates enable the iden¬ 
tification of new elemental order of descending elpasolite 
formation energy, crystals with peculiar atomic charges, 
250 elpasolites with lowest formation energies, as well as 
128 new crystal structures predicted to lie on the convex 
hull among which NFAl 2 Ca 6 , an elpasolite with unusual 
composition and atomic charge. The ML model achieves 


the same, or better, accuracy to DFT as DFT to exper¬ 
imental data and can be generalized to any crystalline 
material. 

The ML-model is based on kernel ridge regression Hi- 
Ill] which maps the non-linear energy difference between 
the actual DFT energy and an inexpensive approximate 
baseline model into a linear feature space HSj. More 
specifically, we construct a ML model of the energy dif¬ 
ference to the sum of static, atom-type dependent, atomic 
energy contributions e/t, obtained through fitting of each 
atom type t in all main group elements up to Bi. The 
energy-predicting function is a sum of weighted exponen¬ 
tials in similarity d between query and training crystal, 

N' N 

E{x) + (I) 

I i 

where N' is the number of atoms/unit cell (10 in the case 
of elpasolites), and the second sum runs over all N train¬ 
ing instances, ai are the weights obtained through linear 
regression, and a is the global exponential width, regu¬ 
lating the length scale of the problem. The similarity di 
is the Manhattan distance, i.e., di = ||x —While 
various crystal structure representations have previously 
been proposed PHIl HU, we have found the following 
representation to yield superior performance: x is a n x 2 
tuple that encodes any stoichiometry within a given crys¬ 
tal prototype. For quaternary (n = 4) elpasolites, each 
a:i _4 refers to the 4 representative sites, the atom type 
for each site is represented by its row (principal quantum 
number 2 to 6) and column (number of valence electrons) 
I to VIII in the periodic table, and sites are ordered ac¬ 
cording to the Wyckoff sequence of the crystal. As such, 
X implicitly represents the energy minimum structure for 
a system restricted to this prototype—without explicitly 
encoding precise coordinates, lattice constants, or other 
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FIG. 1. (a) Illustration of elpasolite crystal (AlNaK 2 F 6 

structure). The four-tuple x = (aii,..., 214 ) representation 
of atomic sites is specified, (b) Frequency of elements (de¬ 
fined by nuclear charge Z) for the three data sets studied, (c) 
Mean absolute out-of-sample prediction error as a function of 
training set size for the three data sets studied. Inset: Error 
distributions and DFT vs. ML scatter plots for three train¬ 
ing set sizes for the (I—VIII) data set. (d) Estimated mean 
energy contribution of each element to formation of any el¬ 
pasolite crystal. The color code reflects the new elemental 
elpasolite order, (e) Lowest 250 ML model predicted forma¬ 
tion energies of elpasolites in ascending order from (III—VI) 
(TOP) and (I-VIII) (MID and BOTTOM) data sets. Results 
in TOP and MID panel correspond to ML models trained on 
2000 examples, BOTTOM panel results correspond to a ML 
model trained on 10k crystals. Validating DFT energies are 
shown aside, (f) Distributions of absolute lowest possible to¬ 
tal oxidation states (LPTOS) in energies. Formulas indicate 
the lowest lying crystals. 


(approximate) solutions to Schrodinger’s equation. This 
representation is not restricted to the elpasolite structure, 
it can be used for any crystalline configuration: Below we 
also briefly discuss test results for small size ML models 
applied to ternary crystals. 

For training and evaluation, we have generated DFT 
formation energies for two data sets of elpasolites 
(for computational details see supplementary materials. 
Ref. fT71) . one small, (III—VI), made up from only 12 ele¬ 
ments, C, N, O, Al, Si, P, S, Ga, Ge, As, Sn, and Sb; and 
one large, (I—VIII), containing all main-group elements 
up to Bi. Since (III-VI) only comprise ~12k possible 
permutations, we have obtained the complete list of for¬ 
mation energies. (I—VIII) consists of 10 k structures. 


i.e. 0.5% of the total number of 2M possible crystals. 
The (I—VIII) data set has been generated through ran¬ 
dom selection of elpasolites while ensuring an unbiased 
composition. To verify that the ML model is general and 
not only restricted to elpasolites, we have also included a 
materials project [T5] dataset (MPD) consisting of ~0.5k 
ternary crystals in ThCr 2 Si 2 (I4/mmm) prototype and 
made up of 84 different atom types. The distribution 
of the chemical elements in the data sets are shown in 
Fig.Qb). Numerical results on display in Fig. He) indi¬ 
cate systematic improvement of the predictive accuracy 
of the ML model with increasing training set size, for all 
three datasets. The inset details normally distributed er¬ 
rors and scatter plots which systematically improve with 
training set size for the models trained on the (I—VIII) 
datasets. For a lOk training set, the ML model reaches 
a MAE of 0.1 eV/atom compared to reference, i.e. semi¬ 
local DFT. DFT, in turn, has an estimated MAE of ^0.19 
eV/atom compared to experiments on heats of formation 
for general chemistries with filled d-shells m- Eor tran¬ 
sition metal oxides and elemental solids other groups re¬ 
port DET errors on the order of 0.1 eV/atom [501 [S]- 
The converging performance for training on nearly all 
crystals of the (III-VI) data set suggests that our crystal 
representation of elpasolite structures Fig.j^a) accounts 
for all necessary degrees of freedom. While errors decay 
systematically and linearly on a log-log plot, the learning 
rate levels off as N approaches the 100%, i.e. 10k. This is 
due to the employed relaxation convergence threshold of 
±10 meV/atom in the DFT calculations. Any inductive 
model must fail to go below this level, and only numeri¬ 
cally more precise reference numbers would mitigate this 
trend. In all validation tests dealing with energy predic¬ 
tions for random out-of-sample crystals, the ML model 
performance meets the expectations set in Fig. [^c). For 
example, drawing 100 crystals at random from (III-VI) 
and (I—VIII) datasets ML models perform as expected 
when compared to the result from validating DFT calcu¬ 
lations (cf. Fig. S3 [H]). (III-VI) and (I—VIII) reaches 
a MAE of 0.1 eV/atom at roughly 2.5 % and 0.5 % of the 
total number of crystals respectively, suggesting that the 
machine ’’efficiency” increases with number of possible 
combinations. We note however that two observations of 
the same structure is not sufficient to see any trends on 
how much training data is needed. 

Having established the performance of the ML model, 
we have subsequently used the 10 k training set model 
(I—VIII) for investigation of the elpasolite universe. Es¬ 
timated formation energies for all 2 M elpasolites are fea¬ 
tured in Fig. The formation energies are clearly domi¬ 
nated by the chemical identity of position 4, followed by 
position 3 but according to a different pattern. Chemical 
identity at position 1 and 2 has the smallest influence and 
very similar impact (also illustrated in Fig. SI of Ref.ITTl) 
Due to the effective degeneracy of positions 1 and 2, all 
inner matrices in Fig. [^appear largely symmetric. Figure 
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FIG. 2. Formation energies for all 2 M elpasolites made up of all main-group elements up to Bi predicted by the 10 k ML-model. 
The outer vertical and horizontal axis correspond to xa and xa symmetry position, respectively. Inner vertical and horizontal 
axis correspond to X 2 and xi symmetry position, respectively. Elemental sequence follows the elpasolite order of Fig. Bd). 
White pixels correspond to subspaces of ternary, binary, or elementary non-elpasolite crystals. 


Bd) shows the average contribution of each element to 
the formation energies estimated by the 10 k ML model. 
These average contributions per element are used to or¬ 
der the elements in Fig. B to yield the smoothest elpa¬ 
solite map. Arranging elements by their nuclear charge, 
or by their Pettifor order [^, results in a much more 
oscillatory map or stripe-like pattern due to underlying 
periodicities (cf. Ref. [T7|). This elpasolite error is dom¬ 
inated by the element identity in position 4 (compare 
Figure Bd) to Fig. SI of Ref. |T7]); its break-down is 
small as illustrated for pair-wise energy contributions in 
Fig. S5 of Ref. [m 

Figures B^) visualize the bonding emergent from the 
geometry and bond coordination of the elpasolite crystal 
structure (see also figures in supplementary materials). 
Fluorine and carbon are at the respective ends of the 
global scale of low and high formation energies. But also 
alkaline metals, alkaline earth metals, and oxygen con¬ 
tribute to lowering the formation energy. On average, 
the formation energies of elpasolites involving halogens, 


alkaline metals, noble gases increase as the periodic table 
is descended. The opposite holds for all other elements, 
except oxygen, boron, carbon and nitrogen, which all 
have a noticeably higher average formation energy than 
any other element. A saddle point can also be observed 
in the midst of the periodic table table as well as two 
valleys along the halogen and alkaline earth rows. Site- 
specific resolution indicates that fluorine fits best with 
the bond coordination of sites 1, 2, and 4, whereas the 
same does not apply to later halogens (not shown in the 
paper, see supplementary materials EZl). In contrast, as 
the element on site 3 goes down column II in the peri¬ 
odic table, the formation energy is successively lowered, 
with Ca, Sr, and Ba contributing more than any halogen 
atom. On sites 1 and 2, the formation energy gener¬ 
ally increases the most for heavy noble gases. On sites 
3 and 4, it is carbon, followed by neighboring B and N 
that increase the formation energy the most. The accu¬ 
racy of linear single atom energy models based on these 
scales, however, is not on par with the ML-model, and— 
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maybe more importantly—cannot be improved system¬ 
atically through increasing training set sizes but rather 
converges to a finite residual error. 

In order to achieve satisfying accuracy of ±0.1 
eV/atom for elpasolites, a relatively large training set 
of 10 k is needed. This is likely due to the sparsity of 
crystals at the opposite ends of the high and low for¬ 
mation energy spectrum; this results in a decreased pre¬ 
dictive ML model accuracy for crystals in these regions, 
which is demonstrated in Fig S6 in Ref. [T71 Neverthe¬ 
less, the 10 k ML model readily identifies a larger set of 
lowest lying elpasolites for which the actual DFT min¬ 
ima can be obtained through subsequent DFT based 
screening. This is shown in Fig. [^e) where the 250 
crystals with the lowest ML predicted formation ener¬ 
gies are shown in ascending order (with further details 
on these systems in the supplementary mateirals. [T71 1 
Subsequent screening with DFT indicates the 26*^ crys¬ 
tal CaSrCs 2 F 6 (out of 2M) to be the global formation en¬ 
ergy minimum at —3.44 eV/atom, closely followed a near¬ 
degenerate isomer SrCaCs 2 F 6 . The DFT energies of the 
next two degenerate pairs CaSrRb 2 F 6 /SrCaRb 2 F 6 and 
CaBaCs 2 F 6 /BaCaCs 2 F 6 correspond to —3.41, and —3.39 
eV/atom, respectively. Overall, the elpasolites with the 
most favorable formation energies, ABC 2 D 6 , correspond 
to A and B being late elements from group (II), and C 
and D being a late element from group (I) and fluoride, 
respectively. Populating the four sites with elements from 
groups (II),(II),(I), and (VIII), respectively, differs from 
the experimentally established stoichiometry AlNaK 2 F 6 . 
In fact, the lowest DFT energy crystal with a group- 
(III) element is CsAlRb 2 F 6 (in 69*^ position) with —3.09 
eV/atom (ML energy: —2.96 eV/atom, see supplemen¬ 
tary materials). 

We have also used our predictions to analyse atomic ox¬ 
idation states in elpasolites. In particular, we have found 
that roughly 6 % of the crystals with formation ener¬ 
gies below —1 eV/atom exhibit unusual atomic charges: 
They are low in energy despite the fact that no combi¬ 
nation of conventional atomic charges would result in a 
neutral system. In order to identify these crystals, we 
have used the absolute value of the lowest possible total 
oxidation state (LPTOS) that could possibly be realized 
using the list of typical atomic oxidation states on dis¬ 
play in Table III in Ref. |T71 The lowest lying crystals 
have a LPTOS of 0 (—3 to —3.44 eV/atom formation en¬ 
ergies). However, already at —3 eV/atom crystals with 
LPTOS of 2 or 1 start to occur. At formation energies 
of ~ —1.25 eV/atom and higher, the number of crystals 
with non-zero LPTOS increases rapidly, with LPTOS as 
high as 12. Corresponding crystal frequency distributions 
are shown in Fig. j^e), along with formulas for the mu¬ 
tually lowest lying crystals. Interestingly, the number of 
crystals with zero LPTOS increases monotonically with 
formation energy, while for nonzero LPTOS crystals the 
distribution is oscillatory. 


To demonstrate the usefulness of our ML model we 
have applied it to identify thermodynamically stable el¬ 
pasolites. To this end, we first selected all those 274,213 
elpasolites with negative ML formation energies, and 
without rare gas elements. Since stability depends on the 
energy difference to any possible polymorph or compet¬ 
ing segregated phases |5S1121] , we have queried available 
DFT formation energies stored in the Materials Project 
(MP) [T5]. Some elpasolites, such as the archetypi¬ 
cal AlNaK 2 F 6 , are already stored in the MP. Using 
DFT results stored in the MP for competing quaternary, 
ternary and binary phases, we have constructed phase 
diagrams [24] for all 274,213 crystals. For each crystal, 
there are on average ^12 competing phases stored in the 
MP; there is only one combination of elements (Cs-Li- 
Na-Rb) for which no binary or ternary competing phases 
have been stored. For 2133 out of the 274,213 crystals, 
the resulting stabilization energy is below the known con¬ 
vex hull of stability (for more details, see Ref. HT]). Sub¬ 
sequent validation using DFT instead of ML confirms 
128 out of them to be stable. Out of these 38 are poly¬ 
morphs (ABC 2 D 6 vs. BAC 2 D 6 ) resulting in 90 unique 
stoichiometries. Such a reduction (274,213 —>■ 90) in num¬ 
ber of crystal candidates is to be expected since sorting 
crystals by ML energies being lower than the convex hull 
systematically favors those with negative ML formation 
energy errors. We note that this does not amount to 
proof that the 90 crystals are stable: The MP database is 
not exhaustive. This implies that other new competing 
phases and materials, with even stronger stabilization, 
might still be discovered in the future. Also, the intrin¬ 
sic error of the employed DFT method within the MP 
might still alter the outcome with respect to experiment. 
As such, the 90 new elpasolite DFT energies represent 
new upper bounds on the convex hull at the correspond¬ 
ing compositions. They have been submitted to the MP 
database, and most of them have been made available for 
further studies (See Table V in Ref. [T7] for the list of the 
90 structures). 

Among these elpasolites, metals, semiconductors and 
insulators are roughly distributed equally. All structures 
with an earth alkaline metal in crystal position 4 have 
a low or zero band-gap. We have noted an intriguing 
yet stable structure of a conductor, NFAl 2 Ca 6 (MP ID: 
mp-989399, # 20 in Table V in Ref. ITTll with Ca at po¬ 
sition 4, instead of F or Cl. Bader charge analysis [23- 
ET] (Tabled indicates an exotic negative oxidation state 
for A1 (-II), previously only reported for A1 in substan¬ 
tially larger Zintl phase unit cells (Sri 4 [Al 4 ] 2 Ge 3 ) [2H]- 
Since Bader charges sometimes yield non intuitive re¬ 
sults ]29| ]30] , calculated Hirshfeld ]3T] and Voronoi de¬ 
formation density ]29l ]32] charges (Table |T| confirm the 
negative oxidation state, albeit reduced by one unit (-1). 
The calculated phonon spectra of NFAl 2 Ca 6 also indicate 
stability m- 

In conclusion, we have developed and used ML-models 
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TABLE I. Calculated atomic charges in NFAbCae elpasolite 
using different methods (obtained using SIESTA[33]L 


Method 

N 

F 

Al 

Ca 

Bader 

-2.00 

-0.98 

-2.13 

1.20 

Hirshfeld 

-0.63 

-0.36 

-1.05 

0.52 

Voronoi deformation density 

-0.81 

-0.29 

-1.13 

0.56 


of formation energies to investigate all possible elpasolites 
made up of main-group elements. We have presented nu¬ 
merical results for ~2 M formation energies. The ML- 
model is only implicitly dependent on spatial coordinates, 
through reference data used for training. No spatial co¬ 
ordinates are needed for new queries, yet for a training 
set of 10 k crystals the model reaches ±0.1 eV/atom— 
comparable to DFT accuracy for solids. The results have 
been used to identify the most strongly bound elpasolites 
as well as to investigate energy and bonding trends at 
crystal structure sites, leading to a new “elpasolite order” 
of elements, consistent with the bonding physics in the 
elpasolite crystal structure. We identified and added 128 
structures (90 unique stoichiometries) to the convex hull 
of the MP database. Charge analysis for the metallic el¬ 
pasolite NFAl 2 Ca 6 indicates a negative atomic oxidation 
state of Al. This outcome directly demonstrates that our 
method can be used for the discovery of stable as well 
as unconventional chemistries. Due to the low compu¬ 
tational cost of the ML model one can now also afford 
to remove human bias by considering also those struc¬ 
tures which previously would have been excluded due to 
’’chemical intuition”. Our results suggest that ML mod¬ 
els hold great promise for the computational screening 
of polymorphs, other crystal structure symmetries, solid 
mixtures, phase transitions, or defects at unprecedented 
rate and extent. Other crystal properties than energies 
could also be considered. 
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